--careful \
--isolate \
-o hyb_spades_ecoli_ass
We used the “--gzip” option with the “fastq-dump” command to compress the FASTQ files.
The SPAdes program has different modes for different applications. With the “spades.py”
command, we can use the “--meta” option for metagenomic mode, the “--bio” option for
biosyntheticSPAdes mode, the “--corona” option for coronaSPAdes mode, the “--rna” option for
transcriptomic mode (RNA-Seq reads), the “--plasmid” option for plasmid detection mode,
the “--metaviral” option for virus detection mode, the “--metaplasmid” option for metagenomic
plasmid detection mode, and the “--rnaviral” option for virus assembly mode (from RNA-Seq
reads).
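The mapping between applications and mode options listed above can be sketched as a small shell helper. This is only an illustration: the function name “spades_mode_flag”, the application labels (such as “coronavirus”), and the placeholder file names are hypothetical and are not part of SPAdes; only the option names come from the list above. The sketch prints the command rather than running it.

```shell
# Hypothetical helper (not part of SPAdes): map a made-up application label
# to the SPAdes mode option described in the text.
spades_mode_flag() {
    case "$1" in
        metagenome)   echo "--meta" ;;
        biosynthetic) echo "--bio" ;;
        coronavirus)  echo "--corona" ;;
        rnaseq)       echo "--rna" ;;
        plasmid)      echo "--plasmid" ;;
        metaviral)    echo "--metaviral" ;;
        metaplasmid)  echo "--metaplasmid" ;;
        rnaviral)     echo "--rnaviral" ;;
        *) echo "unknown application: $1" >&2; return 1 ;;
    esac
}

# Build and print the command for one application without executing it.
# reads_1.fastq, reads_2.fastq, and out_dir are placeholder names.
mode=$(spades_mode_flag coronavirus)
echo "spades.py $mode --pe1-1 reads_1.fastq --pe1-2 reads_2.fastq -o out_dir"
```

Printing the command first makes it easy to check which mode will be used before starting a long assembly run.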
In the following example, we will download the Illumina paired-end FASTQ files of the run
“ERR8314890”, produced by whole genome sequencing for SARS-CoV-2 surveillance, from
the NCBI SRA database. Then, we will use the coronaSPAdes mode to assemble the SARS-CoV-2
genome. For this exercise, you can create a directory “sarscov2” and change into it as:
mkdir sarscov2; cd sarscov2
Then, you can download the FASTQ files from the NCBI SRA database using the SRA Toolkit
program “fasterq-dump”.
fasterq-dump --verbose ERR8314890
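Before assembling, it can be worth confirming that both paired-end files were written and contain the same number of reads. The following is a minimal sketch, assuming that the download produced “ERR8314890_1.fastq” and “ERR8314890_2.fastq” in the current directory and that each FASTQ record occupies exactly four lines. The function names “count_reads” and “check_pair” are made up for this illustration.

```shell
# Count reads in an uncompressed FASTQ file.
# Assumption: every record spans exactly four lines (no wrapped sequences).
count_reads() {
    local lines
    lines=$(wc -l < "$1")
    echo $(( lines / 4 ))
}

# Check that both mates exist, are non-empty, and hold the same number of reads.
check_pair() {
    local r1="$1" r2="$2" n1 n2
    if [ ! -s "$r1" ] || [ ! -s "$r2" ]; then
        echo "missing or empty file: $r1 or $r2" >&2
        return 1
    fi
    n1=$(count_reads "$r1")
    n2=$(count_reads "$r2")
    echo "$r1: $n1 reads; $r2: $n2 reads"
    [ "$n1" -eq "$n2" ]
}

# Run the check only if the downloaded files are present.
if [ -e ERR8314890_1.fastq ]; then
    check_pair ERR8314890_1.fastq ERR8314890_2.fastq
fi
```

A mismatch between the two counts usually points to an interrupted download, in which case the files should be downloaded again before running the assembler.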
Then, you can run the SPAdes program to assemble the SARS-CoV-2 genome using the “--corona”
option.
python spades.py \
--pe1-1 ERR8314890_1.fastq \
--pe1-2 ERR8314890_2.fastq \
--corona \
-o sarscov2_genome
The output files, including the FASTA files of contigs and scaffolds, will be saved in the
specified output directory “sarscov2_genome”.
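As a quick first look at the result, the number of sequences and the total number of bases in an output FASTA file can be counted with “awk”. This is a minimal sketch, assuming the contigs were written to “sarscov2_genome/contigs.fasta”. The function name “fasta_summary” is made up for this illustration.

```shell
# Print the number of sequences and the total number of bases in a FASTA file.
fasta_summary() {
    awk '
        /^>/ { n++; next }                           # header line: one more sequence
        { gsub(/[ \t\r]/, ""); total += length($0) } # sequence line: add its bases
        END { printf "sequences=%d total_bp=%d\n", n, total }
    ' "$1"
}

# Summarize the contigs only if the assembly output is present.
if [ -e sarscov2_genome/contigs.fasta ]; then
    fasta_summary sarscov2_genome/contigs.fasta
fi
```

The same function can be applied to the scaffolds file to compare it with the contigs.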
The other SPAdes modes are run in the same way, by replacing “--corona” with the
corresponding option.
3.3 GENOME ASSEMBLY QUALITY ASSESSMENT
After assembling a genome with any of the de novo genome assemblers, the next step is
to assess the quality of the assembly. Assembly assessment metrics provide important
information on how reliable the assembly is.
There are two approaches to the quality assessment of an assembly. The first one is a
statistical approach that depends on statistical metrics for measuring the quality of a genome